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m Abstract 

The skew-normal and the skew-t distributions are parametric families which are cur- 
rently under intense investigation since they provide a more flexible formulation com- 
pared to the classical normal and t distributions by introducing a parameter which reg- 
ulates their skewness. While these families enjoy attractive formal properties from the 
probability viewpoint, a practical problem with their usage in applications is the possib- 
ility that the maximum likelihood estimate of the parameter which regulates skewness 
diverges. This situation has vanishing probability for increasing sample size, but for fi- 
nite samples it occurs with non-negligible probability, and its occurrence has unpleasant 
effects on the inferential process. Methods for overcoming this problem have been put 
forward both in the classical and in the Bayesian formulation, but their applicability is 
restricted to simple situations. We formulate a proposal based on the idea of penalized 
J> likelihood, which has connections with some of the existing methods, but it applies more 

VO generally, including in the multivariate case. 

CO 

Some key-words: anomalies of maximum likelihood estimation, boundary estimates, penal- 
ized likelihood, skew-elliptical distributions. 
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1 Skew-normal distribution: inferential issues 



1.1 Background 

A currently active stream of literature deals with a set of probability distributions whose 
most prominent representative is the skew-normal distribution, whose density function in the 
scalar case is 
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where <f> and $ denote the N(0, 1) density and distribution function, respectively. The skew- 
normal density depends on parameters £, u (with u > 0) and a, which regulate location, 
scale and shape, respectively. If Y is a random variable with density function (1)| we shall 
write Y ~ SN(£,a; 2 ,a). When a = 0, we return to the regular normal distribution N(£,w 2 ); 
otherwise the distribution is positively or negatively asymmetric, in agreement with the sign 
of a. 



The basic construction (1) can be extended in several directions, to various levels of gen- 
erality, leading to a much broader set of distributions; the terms skew-elliptical and skew- 
symmetric distributions are usually adopted in this context. We shall introduce some of these 
other constructions in the course of the paper where appropriate, specifically its multivariate 
version and the closely-related skew-t distribution. For a general overview of the subject, 
we refer the readers to the book edited by Genton ( 2004 ) and the review paper of Azzalini 
(2005); a concise account on the skew-normal distribution, including its multivariate version, 
is given by Azzalini] ( |201ip . 



Much of the appeal of distribution |(1)| comes from its mathematical tractability and from 
a number of formal properties which either replicate or at least resemble those of the normal 
distribution, so that they support the adoption of the name 'skew-normal'. These properties 
are discussed at length in the above-quoted references and we do not dwell into this aspect 
which is outside the scope of the present paper. 

The statistical side of the treatment of |(1)| shows instead two peculiar features which call 
for special treatment if one wants to use this distribution in data analysis. Given a random 



sample with components independently drawn from |(l)| the first of these problematic aspects 
refers to the specific value a = 0, and it shows up in a few intimately related manifestations, 
all originated by the proportionality of the score functions for £ and a to each other. The 
main implications of this fact are that, at a = 0, (i) for any sample, the profile log-likelihood 
function for a has an inflection point, (ii) the expected information matrix is singular, even if 
the distribution is identifiable. 

This singularity issue has given rise to much concern, being often perceived as a major 
structural problem of the skew-normal family of distributions, while it is only a problem of the 
adopted parameterization. Moving from (£,w,a) to the 'centred parameterization' proposed 
by [Azzalini ( 1985 ), essentially the cumulants up to the third order with the third one in 
standardized form, removes all these issues. For a more extended discussion of this point and 
for other relevant references, see §2.4 of Azzalini (2005). For a multivariate version of the 



centred parameterization, see Arellano-Valle & Azzalini (2008). 



1.2 MLE boundary values 

The present paper deals instead with the second one of the two peculiar aspects mentioned 
above, represented by the fact that, with non-zero probability, the maximum likelihood estim- 
ate (MLE) of a diverges. The problem is easily examined in the one-parameter case SN(0, 1, a) 
where the log-likelihood based on a random sample z = (z\, . . . , z n ) is 

n 
1(a) = constant + \, Co(« z i) (2) 

i=l 

where Co(^) = log{2 $(»)}. Since (o is a monotonically increasing function, it is then imme- 
diate that the maximum of £ is at a = oo when all Z{ > 0, and it is at a = — oo if all Zi < 0, 



as noted by |Liseo| ( |199Qp . Therefore, if the data have all equal sign, their actual location is 
irrelevant. The value a = oo corresponds to the half-normal or \ distribution; if a = — oo the 
X distribution is mirrored on the negative axis. 

Further, it is only when all sample values have the same sign that we get a divergent MLE, 
since it can be shown that, when there are observations with opposite sign, the MLE is finite 



( [Martinez etaL] [2008 ) . 



Taking into account the known fact ¥{Z < 0} = \ + it 1 arctana, when Z ~ SN(0, 1, a), 
the probability of a divergent MLE is immediately written as 

/l arctana\ n (\ arctana 

Pn,a = 7T + o + 



7T J \2 IT 

This probability goes rapidly to as n — > oo, provided \a\ < oo, but for small or moderate 
sample size it can be non-negligible, especially if a is far from 0. To get an idea, consider that 
P25,5~ 0.197 and p 50 ,5~ 0.039. 

In the three-parameter case SN(£, oo 2 , a), infinite values of the MLE can occur as well, but 
a characterization of the samples leading to such estimates has not been obtained, as far as 
we know. It is convenient to illustrate this case with the aid of a numerical example, and we 
make use the so-called 'frontier data', presented by Azzalini & Capitanio ( [1999] ), which is a 



set of n = 50 values sampled from SN(0, 1, 5). For these data, the MLE a of a diverges when 
one assumes a three-parameter SN(£, u 2 , a) family of distributions. 

The left panel of Figure [T] displays these data together with their histogram and two fitted 
curves: one corresponds to the MLE, another one is a non-parametric kernel-type estimate, 
using a Gaussian kernel with bandwidth chosen by cross-validation, and the third curve will 
be described later on. Since a = oo, the latter curve is a shifted and scaled x distribution, 
with origin just below the smallest sample value. The right-side panel of this figure displays 
the deviance function 

D(a) = 2 {logL*(a) - logL*(a)} 

where L*(a) denotes the profile likelihood for a. The curve, which appears to be monotonic- 
ally decreasing, becomes flat for large a. 

As mentioned earlier, the inclusion of the limiting points a = ±oo in the parameter space 



is admissible for distribution |(1)| However, a = ±oo represent a peculiar situation, not only 
because we are at the boundary of the parameter space, but in addition the support of the 
distribution collapses to the half-line, instead of the complete real line as for any finite a. 




Figure 1: Frontier data. Left panel: rug-plot and histogram with superimposed MLE fit 
(continuous line), non-parametric fit (dashed line) and MPLE fit (dot-dashed line). Right 
panel: deviance function of a. 



When an unbounded estimate, a = ±00, occurs there are two alternative aptitudes of a 
statistician. One is to say: if the MLE is a = 00, we still take it; after all, this is an admiss- 
ible value of the parameter. Notice however that, on the boundary on the parameter space, 
standard asymptotic distribution theory of MLE does not hold, and a special theory must 
be developed to obtain standard errors of the estimates. The other aptitude is to disregard 
a = 00 as an anomaly of MLE. Not only this parameter point is peculiar for the general reas- 
ons indicated earlier, but in addition it often does not appear to actually describe the data in 
a satisfactory way. For instance, in the case of Figure [T] neither the histogram nor the non- 
parametric density estimate exhibit the extreme pattern in the data which are implied by the 



MLE value. Furthermore, as remarked by |Azzalini & Capitanio| ( [1999| ), the sample index of 
skewness of the data, 0.902, is well inside the admissible range of 71, about ±0.99527, whose 
extremal values correspond to a = ±00. 

Yet another argument against the MLE choice is provided by the plots in Figure [2] which 
displays the behaviour of the three MLE components when the minimum sample value, 
—0.1032, is replaced by another value, m say, which ranges from —0.20 to —0.10. While 
in the left panel £ and Cj are very stable as m moves along the range, the evolution of a in 
the right panel has a dramatic discontinuity. When m varies from —0.2 to —0.152, a increases 
gradually from about 12 to about 40, but atm = —0.151 it jumps above 7200. This value is 
however only where the numerical optimization procedure was stopped searching, but the 
search would lead to increasingly large values if it was left running, although the divergence 
of a corresponds to a negligible increase of the log-likelihood function, as indicated by the 
right panel of Figure [T] Such a severe instability of an estimator in reaction to this minute 
variation of a single sample value is unacceptable on general grounds. 



-0.16 -0.14 
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Figure 2: Frontier data: evolution of the MLE components when the minimum sample value 
ranges —0.20 to —0.10. The left panel refers to £ (bottom curve) and Cj (top curve); the right 
panel refers to a. The bullets denote the estimates using the original sample minimum. 



1.3 Alternative options 

There appear to exist both general arguments and numerical evidence against the MLE solu- 
tion, at least when it leads to boundary values of a. Notice that the problem cannot be cured 
by reparameterization, like for the singular information matrix, since any regular transform- 
ation maps boundary points of the parameter space (£, u>, a) into boundary points of the new 
parameter space. In addition, if a = ±oo, the equivariance property of MLE would lead 
to take the transformed boundary point as the new MLE. Therefore a different estimation 
method need to be considered. 

A number of alternative proposals have been put forward, adopting a range of different 



approaches. A preliminary solution to the problem has been put forward by Azzalin i & Cap- 
( 1999| ) in the discussion following the presentation of the frontier data. This is based 



itanio 



on the consideration that the log-likelihood function varies little over a large span of the a 
axis; see the right panel of Figure [T] for a visual perception at leas of the profile version of the 
log-likelihood. It is then reasonable to take a value whose log-likelihood is below the max- 
imum by a non-significant amount. While this technique works well in practice and it can 
applied also to a variety of similar problems, it leaves some arbitrary margin on the choice of 
the acceptable amount of drop from the maximum. 

Sartori ( ]2006 ) has specialized the general bias-reduction method of Firth ( 1993| ) to the 
present context. This technique replaces the usual likelihood equation I' (a) = by the 



modified form 



I' (a) + M(a) = 



and the correction term M(a) in the case Z ~ SN(0, 1, a) takes the form 
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a p (a)=E{Z p ( 1 (aZ) 2 } 



(3) 



(4) 



where Ci(^) = Co( x ) i s trie inverse Mills ratio. Sartori shows that, for any sample, the mod- 
ified likelihood equation has at least one finite solution. An interesting feature is the close 
similarity of the shape of M(a) with the derivative of the logarithm of Jeffreys' uninformat- 
ive prior. Since the three-parameter case SN(£, oj 2 , a) is hard to tackle via the general Firth's 
method, Sartori introduces a specifically constructed two-step scheme. 

In a Bayesian framework, Liseo & Lope rfido ( 2006| ) adopt the Jeffreys prior for a, which 
they prove to be a proper distribution over the real line. For the thee-parameter case, an 
expression of the reference-integrated likelihood is obtained, although this is difficult to use 
for n not small. Follow-up work has been done by Bayes & Branco ( |2007| ) whose development 
includes a closed-form approximation 
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(5) 



which is based on replacing the normal distribution function entering in the expression of 
a p (a) by a rescaled logistic distribution. The subsequent simulation study confirms the close- 
ness of the Sartori-Firth estimate to the Jeffreys' posterior mode. 



An alternative route has been taken by Greco ( 2011 ) using a minimum Hellinger distance 
criterion, which also leads to finite estimates of a for the case SN(0, l,a). This approach 
works for the three-parameter case as well, without introducing special adaptation. However 
it involves the choice of a specific density estimate and of the connected smoothing parameter, 
which influences the final outcome. 

The above-recalled constructions lead to elegant results for the basic case SN(0, 1, a), but 
the three-parameter case SN(£,w 2 ,a) already poses non-trivial additional difficulties for the 
first two of them. The multivariate case has not been tackled at all, as far as we know, except 



for a very brief mention of |Greco| ( |2011| ) . The analogous problem with the skew- normal 
distribution replaced by the skew-t distribution is inevitably more complex, because of one 
additional parameter involved and the diminished mathematical tractability; we shall review 
the existing results for the univariate case in Section [3] 

The aim of the rest of the paper is to develop a procedure which can be applied to a 
range of situations, including the multivariate case, with the requirement that its behaviour is 
largely the same of the MLE, with only a minor modification to prevent boundary estimates. 
We first develop our proposal for the skew-normal distribution, and later extend it to the 
skew-t distribution. 



2 Penalization of the log-likelihood function 



2.1 General remarks 

Penalization of the log-likelihood function is a device which has been adopted in a number of 
problems to correct some undesirable behaviour of the regular MLE. Sartori| ( [2006 p. 4262) 
has remarked that (3)| can be viewed in this light. 

Our aim is to avoid divergent estimates of a, in a formulation applicable to a wide range 
of situations of the context described earlier. To this end, consider a function of the form 



l p {6) = £(9) - Q 



(6) 



where 1(9) denotes the log-likelihood function for 9 which denotes the whole set of para- 
meters associated to the chosen parametric family, and Q represents a non-negative quantity 
which penalizes the divergence of a and it remains O p (l) as n increases. A value 9 which 
maximizes £ p (9) will be called a Maximum Penalized Likelihood Estimate (MPLE). 

The log-likelihood functions which we have in mind are primarily of skew-normal type, 
and related ones discussed in Section [3] but part of the development can potentially be of 
interest also in other settings. It is assumed that £(9) satisfies the standard conditions for 
consistency and asymptotic normality of the regular MLE, 9, as set for instance in Theorem 
5.2.2 of|Sen& Singer] ([T993]). 



Besides the univariate distributions SN(0, 1, a) and SN(£, w 2 , a), we consider also the mul- 
tivariate skew-normal distribution SN^(^, f2, a) whose density function is 

2(j) d (x-t;n)$(a T uj- 1 {x-0) , xeR d , (7) 

where 4>d(x; O) denotes the Nd(0, f2) density function and ui is a diagonal matrix formed by the 
standard deviations of 0; in this case a and £ are d-dimensional parameters. For these three 



parametric families, the parameter 9 in (6) has 1 or 3 or d(d + 5)/2 components, respectively. 
The translation into mathematical notation of the above-indicated requirements for Q is 
that 

Q>0, Q\ a=0 = 0, lim Q = oo (8) 

where a, is the j-th component of a, for j = 1, . . . , d. For the SN distribution, and the ST 
distribution to be discussed later, log L does not diverge to +oo even when the MLE of a 



diverges; combining this fact with the third requirement in |(8)| we are ensured that | (6) | has 
a finite maximum in the interior of the parameter space. In a different context where some 
components of the MLE can diverge but the log-likelihood itself is bounded from above, the 



same argument applies provided the third condition in |(8)| is suitably adapted to the different 
parameter set. 

In the next sections Q will be a function of the parameters only, not depending on the data. 
This condition could be removed as long as Q is O p (l); however, the mathematical treatment 
would be more elaborate and for simplicity we do not consider this case in detail. For the 
subsequent development we also require that Q is twice differentiable with respect to 9, and 
that Q"(9) is a uniformly continuous mapping in a neighbourhood of the true parameter 
point. An additional sensible requirement, although not necessary for our construction, is 
that Q increases monotonically with each |ay|. 

Besides existence of 9, another implication of this formulation concerns the first-order 
asymptotic distribution of 9 when a random sample of size n is available: as n — > oo, this 
asymptotic distribution coincides with the one 9. This fact is intuitive on noticing that both 
and £ p (9) are O p (n) and they differ by Q, which is O p (l), but it can also easily be proved 



formally, under th e above regu l arity c onditions, following essentially an argument similar to 



Theorem 5.2.2 of Sen & Singer (1993 ). We then conclude that 9 is consistent with asymptotic 
distribution 

y/n(6-0) ^N(0,i E {9)- 1 ), whenn^oo, (9) 

where ie(9) denotes the expected Fisher information for a single observation. A more inform- 
ative expression can be obtained by expanding £ p (9) from the point 9 as follows: 

= £'J9) 



i> p 0)+i;0)(§-e) + o(\\e-e\\) 
-Q'{e) + e;{d){6-e) + o{\\~d-d\ 



where £" denotes the matrix of second order derivatives of £ p . We can then write 



C{e)- 1 Q\e) + R 



(10) 



where the remainder R is of smaller order in probability than the leading term under the 
assumption of uniform local continuity of Q". Therefore 9 and 9 differ by O p (n~ l ). 

It is common practice to obtain standard errors for 9 via an approximation of its covari- 
ance matrix with the inverse of the observed information matrix —£"{9)~ 1 = 7o(0) -1 , say. 
Combining the fact 9 — 9 = O t 
approximation 

var< 



_1 ) with local continuity of Q"(6), we obtain the matching 
:{§} « -t'10)- 1 . (11) 



2.2 On the choice of Q 

The above formulation leaves an extremely wide set of options as for choice of the penalty 
function. One way for selecting Q, or nearly equivalently for selecting M{9) = —Q'(9), is to 



require that the first order term of the bias is eliminated. This is the route taken |Firth| ( |1993| ) 
where the requirement of bias reduction is adopted at the onset of the construction; see also 
further work by |Kosmidis & Firth] ( J2009] ). If we insert ±9 on the left-hand side of |(10)| and 
compute expected values, then the leading terms are 

bias(0) « bias(#) + I E (9)~ 1 E{M(9)} 

where Ie(9) = niE{9) is the expected information matrix and of course computation of the 
expected value of M{9) is void when Q does not depend on the data. On equating the left 
side of this expression to 0, we obtain the condition 

E{M(9)} = -I E (9)Uas(9) 

which must be completed by substitution of bias(#) with the first-order term of the MLE bias, 
given by|Cox & Snell| (|1968p. When M(9) does not depend on the data, we arrive at an 



estimating equation of type (3) 

One difficulty with the bias reduction criterion for selecting M is the technical difficulty 
of working out the explicit expression of bias(#). In the skew-normal case, only the one- 



parameter case leads to the relatively simple form (4) where however the coefficients a p (a 
do not have an explicit expression. 



Moreover, as reminded by |Kosmidis & Firth] ( |2009| ), "Point estimation and unbiasedness 
are, of course, not strong statistical principles. The notion of bias, in particular, relates to a 
specific parameterization of a model: for example, the unbiasedness of the familiar sample 
variance S 2 as an estimator of a 2 does not deliver an unbiased estimator of a itself". We 
agree with this view, and in the development to follow the requirement of unbiasedness will 
be taken into account but not in a prescriptive form. 



We conclude this section with a qualitatively motivated choice for Q in the case of a 
multivariate skew-normal distribution. It has repeatedly emerged that many salient features 
of the family [(7)1 depend on the parameters via only the scalar quantity 



a, = a Qa 



where $7 = lj 1 r2w Ms the correlation matrix associated to f2. The prominent role of a 2 
appears in a number of results of |Azzalini & Capitanio| (1999 ) and of Arellano-Valle & Azzalini 



(2008); in the latter paper the dependence is expressed indirectly via the monotonically 
related quantity /3 2 = 2 a 2 /{ir + (V - 2)a 2 }. 

It is then natural to introduce a function Q in (6) which depends on 6 only via a 2 . Com- 



bining this choice with the requirements |(8)| and the consideration that a logarithmic form of 
dependence would keep the modification of the original log-likelihood to a minimum also for 
diverging a*, we arrive at the formulation 



Q = c 1 log{l + c 2 ai) (12) 

where c\ and c 2 are positive constants. This is not yet a fully specified penalty function, but 
the set on alternative options is now greatly reduced. 

2.3 On the choice of Q in the skew-normal case 

We focus initially on the scalar skew-normal distribution. In this case Q and its first derivative 

take the form 

Q(a) = a log(l + c 2 a 2 ) , Q'{a) = 2 Cl c 2 ^ " „ . (13) 

1 + c 2 a z 



Note that approximation (5) of M{a) is of type —Q'(a) with c\ = 3tt 2 /32, c 2 = 8/ir 2 , but 
the intended use of Q(a) is not only for the one-parameter case to which [(5)| applies. 

We want to develop an alternative approximation to M(a) defined by |(4)[ The reason 
for this search is partly to obtain an approximation with stronger theoretical support and, 
more importantly, to explore a direction which can extended to the skew-t case which will be 
considered later. 

First, note that 02(a) and 04(a) are even functions of a. This fact has been proved for 02(a) 
by Liseo & Loperfido (2006ft but the proof extends immediately to any a p (a) with even p. 



Hence 02/04 depends on a only via a 2 . Next we observe that the numerical behaviour of 02/04 
is remarkably linear with respect to a 2 as shown by the left panel of Figure [3] which displays 
the value of 02/04 at 31 equally spaced points of a between and 10; the interpolating line 
will be described shortly. 

Therefore we approximate 02/04 by a function of the form ei + e2a 2 and we select ei and 
e 2 by matching 02/04 and e\ + e2a 2 at a 2 = and a 2 — > 00. To this end, re-write a p (a) as 



where X ~ N(0, 1) and S = 5(a) = a/Vl + a 2 . Hence 



"2(«)_ fl 2 ^{^ 2 Cl(^)} 

o 4 (a) l a J E{X 4 Ci(oX)} 



ei + e 2 a 2 




Figure 3: Left panel: values of 02(a) /04(a) for SN distributions, numerically evaluated at 
a grid of points, plotted versus a 2 and superimposed approximating line. Right panel: Q 
function obtained by numerical integration of—M(a) (continuous line) , by integration of its 



approximation (5) (dashed line) and by Q described in the text (dot-dashed line) 



leading to 



ei 



f'2 



02(0) = E{X 2 } _ 1 
04(0) ~ E{X 4 } ~ 3 ' 

1 + a 2 E{X 2 Ci(5X)} 



lim 



gll _ ^{X 2 Ci(X)} 

a 2 E{X 4 ( 1 (5X)} a 2 ]' E{X 4 Ci(X)} 



0.2854166 (14) 



where the final coefficient was obtained by numerical integration. The line plotted in the left 
panel of Figure [3] has intercept e\ and slope e 2 . 

The right panel of Figure [3] displays three curves: the continuous line is the curve obtained 
by numerical integration of —M(a) defined by (4) and it coincides up to the change of sign 



in 



w ith the continuous curve in Figure 1(b) of |Sartori] ( |2006| ); the dashed line is the Q func tion 
(13) with c% = 3 vr 2 /32, C2 = 8/71" 2 , corresponding to the integral of approximation (5) the 



dot-dashed line is the curve Q with coefficients 

d = 1/(4 e 2 ) « 0.875913, c 2 = e 2 /ei « 0.856250 , 

whose graph is barely distinguishable from the essentially exact continuous curve. 



(15) 



This choice of Q with coefficients |(15)| is motivated by the Sartori-Firth formulation for 
the case SN(0, l,a). However, we adopt the same penalty function more generally to the 
three-parameter case SN(£, oj 2 , a), since the motivation for introducing a penalization of the 
log-likelihood function came solely from its behaviour with respect to a. When this procedure 
is applied to the frontier data, the estimates of (£,w, a) are (—0.034, 1.165, 6.256), quite close 
to the true parameters (0,1,5) and also close to the Sartori values (—0.106,1.234,6.243), 
whose first two components coincide with the MLE. The graphical outcome of the MPLE is 



10 



represented by the dot-dashed curve in the left panel of Figure [TJ It is also worth mentioning 
that a shape parameter a = 6.256 corresponds to an index of skewness 71 = 0.899, very close 
to the sample index of skewness, 0.902. 



Consider now a d-dimensional skew-normal distribution (7) initially in the case of a 



location- and scale-free variable Z ~ SN^(0, Q,a) where U is a correlation matrix. Recall 



the canonical transformation Z* = A* Z introduced in Proposition 4 of Azzalini & Capitanio 



(1999), such that Z* has d independent components of which one (the first one, say) has 
distribution 

( T" ^ /2 

SN(0, l,a*), a* = [a Via 



and the other d — 1 components are N(0, 1). More specifically, an explicit expression of the 
transformation, given in the proof available in the full version of the paper, is 

Z* = (C- l P) J Z (16) 

where C is such that 

and P is an orthogonal matrix whose first column is proportional to Ca. We can then write 

Z* ~ SN d (0, I d , a z *), a z * = (a*, 0, . . . , 0) T . 

Assume now that a random sample z = (zi, . . . ,z n ) is drawn from Z ~ SNd(0, Cl,a). To 
estimate its parameters, we can proceed as follows. 

o The sample z can be converted into an equivalent sample z* = (z\, . . . , z*) drawn from 
Z* ~ SN d {0,I d ,a z *) on setting 

z * = ( C - 1 P) T z t (i = l,...,n). 

The determinant of the Jacobian is det(C T P) = det(C) = det(f2) 1/2 . 

o We now have a sample of size n from SN(0, l,a#) and d — 1 samples of size n from 
N(0, 1). For the first sample, we adopt the above-described scheme, hence the log- 



likelihood function is as on (2) with a replaced by a* and the penalty function is (12) 



with coefficients which can be taken as in |(15)[ 

o Now we can revert back the z* sample to the original z, for which we write the usual 
log-likelihood, except that in this process we have introduced the penalty factor for a. 
In conclusion the penalized log-likelihood is 

£ p (9) = 1(9) - c± log(l + c 2 al) 

n 

= ^(log0 d (z;;ft) + Co(a T ^)) -ci log(l + c 2 a 2 ). (17) 

The following remarks apply. First, the transformations from z to z* and then back to z are 
conceptual steps which serve only as an argument to introduce the penalty factor, and support 
its present form, and they do not need to be actually performed. Second, note that, although 
d does not appear explicitly in the penalty factor, it does have an effect since a 2 reflects d 

11 



indirectly, via the increase of the number of summands in its expression. As a simple example, 
take the case where a is a vector with d identical components ao and 0, = 1^, then a* = da^. 



Now move to the general case of (7) and consider a random sample y = (yi, ■ ■ ■ ,y n ) from 



Y ~ SNd(£, Q, a). By adapting the 1(9) term in (17) for the presence of location and scale 
parameters, we arrive at 

n 

£ p( 9 ) = Y, [ l °sMVi - & O) + (o(a T LO-\y t - &))] - c x log(l + c 2 al) (18) 

i=i 

where # T = (£ T , (vech$7) T ,a T ); here vech is the operator which stacks the lower triangle of 
a matrix in a vector. 

2.4 LRT-type statistic and another estimate 

Consider now the likelihood-ratio test (LRT) statistics in its standard version and the analog- 
ous one for the penalized log-likelihood | (6) [ that is 

W = W(9) = 2{£(9) - £(6)}, W p = W p (d) = 2{£ p 0) - £ p (0)} . 



From the results of Section 2.1 we can say that the null distribution of W p is of x tyP e > 
similarly to W. 



Both intuition and the results of Section 2.1 suggest that W and W p must be strongly 



associated. Some numerical exploration confirms this idea but it also exhibits that the type 
of dependence is quite peculiar. This is illustrated in the left plot of Figure [4] which refers to 
a set of 5000 samples of size n = 1000 from the distribution SN(0, 1, a) with a = 3; hence in 
this case 9 is a. To increase readability, the right plot of the same figure displays a subset of 
the earlier points over a reduced plotting area. The obvious feature of this figure is that the 
joint distribution of (W, W p ) is strongly concentrated along two branches. A closer inspection 
indicates that the top branch is made of points where both a and a underestimate the true 
a = 3, the bottom branch is composed by points where both a and a overestimate, and the 
darker points around in the bottom left corner are those where a — a and a — a take opposite 
signs. Note that the points with opposite signs of the estimation error are whose with smaller 
values of both W and W p , hence those where the estimation error is smaller is size. 

The pattern displayed in Figure [4] appears also in other simulation experiments. This 
behaviour, combined with the final remark of the previous paragraph, suggests an alternative 
estimate 9 defined as a solution of W = W p , written more explicitly as 

9 = {9 : W(0) - W p {9) = 0} (19) 

or equivalently 

9 = {9:Q(9) = q(y)}, q(y) = £(§) - 10) + Q{9) . (20) 

Note that q(y) > 0, with strict inequality if we exclude limiting cases. For the existence 
of this solution we need that (i) W p and W are finite, and (ii) there exist points of the 
parameter space with opposite signs of W p — W. In the context of skew-normal distribution, 
1(9) and £ p (9) are bounded, so condition (i) holds. As for condition (ii), it holds because of 
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Figure 4: Scatter plot of simulated values ofW(a) and W p (a) for samples of size n = 100 
from SN(0, 1, a) when a = 3. Samples where both and overestimate a are marked by grey 
diamonds, those where both underestimate are denoted by grey triangles, those with mixed 
signs are denoted by black circles. The right-side plot refers a subset of the sampled values 
and it shows the enlarged picture of a smaller area. The dashed line is the identity. 



the following facts: 



W(0) > 0, 

W p (0) = 0, 

W p {6) - W{0) < 0, 



W p (0) > 0, 
W{0) = 0, 
W p 0)-W0) >0. 



(21) 



In the multiparameter case, |(19) defines a surface in the parameter space, not a single 
point. In this case we complement (19) with the condition that 9 must lie on the segment 
joining 9 and 0. Since inequalities (21) ensure that and lie on opposite sides of the surface, 
this intersection point exists. From the computational viewpoint, can be located efficiently, 
via a one-dimensional search along this segment, irrespectively of the dimension of 9. 



When Q is chosen of the form |(12)| |(20)| takes a simple form, since the solution of 
c\ log(l + C2al) = q(y) corresponds to the equation of an ellipsoid, that is a T Cla = r(y), 
where r(y) = [e^)/ ci - l]/c 2 . 



2.5 Simulation study 

The above estimation methods have been studied via numerical simulations. The first study 
has considered samples from SN(0, 1, a) where a was the only parameter to be estimated, 
and the true value was a = 5. For each generated sample, four estimators have been 



computed: classical MLE, MPLE with penalty (13) and coefficients (15) the Sartori-Firth 
estimator defined by (3)|(4) the estimator defined by the condition W p 



W in (19) 



These estimates have been computer for 10 6 replicated samples, for each of sample sizes 
n = 50, 100, 250, 350, 500, 1000. 
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The final outcome is summarized graphically in the four panels of Figure [5] where the 
curves associated to the estimators are numbered 1 to 4. Four log-transformed summary 
quantities are plotted versus logn; they are: absolute bias (top left), standard deviation (top 
right), absolute median bias (bottom left), inter-quartile range (bottom right). The cases 
where a diverged have been excluded from the computation of these summaries, in agree- 
ment with |Sartori| ( |2006P and |Bayes & Branco| ( |2007P . 



The doubly logarithmic scale has been adopted for simplifying interpretation of the curves. 
This is especially so for the top left panel, since the bias is expected to decrease at rate n _1 
for the MLE, and at rate n~ 2 for the Sartori-Firth method and its close approximation MPLE. 
We do in fact observe that the slopes of these curves are close to —1 and —2 respectively. The 
top left panel confirms the theoretical expectations, displaying a clear improvement of SF and 



MPLE over MLE; the estimator | (1 9) | has a bias somewhat lower than MLE but decreasing at the 
same rate. MPLE and SF are very similar to each other, with only some discrepancy at the right 
end, with n = 1000, where the bias is very small indeed and the numerical approximation 



involved in evaluating the coefficients a p (a) in (4) may have perturbed slightly the exact 
implementation of the SF method. 

The message emerging from the bottom left panel, which plots the logarithm of the me- 



dian bias, is radically different. Estimator (19) is markedly preferable to the others, MLE is 



second best, and the other two are equivalent up to the point that their curves are super- 
imposed. For all the curves the slope is near to —1, which means ad the rate of decrease 
77T 1 for the median bias, and the differences are only in the intercepts. Median unbiasedness 
is not often considered in theoretical work, presumably because it is a more difficult aspect 
to evaluate compared to mean unbiasedness; it has however the important advantage over 
mean unbiasedness that it is preserved under monotone parameter transformation. 

The two right-side panels convey very similar messages as for variability of the four con- 
tenders: SF and MPLE have the smallest variability, both on the scale of standard deviation 
and of the inter-quartile range, and they are essentially equivalent to each other with super- 



imposed curves; MLE has the largest variability and estimator (19) sits in between the others. 
Differently from the left-side plots, however, here the differences vanish as n diverges, and 
they are effectively almost negligible from n = 250 onwards. 

The usual combination of bias and variance is given by the mean square error, or by its 
square root. If the logarithm of either of them is plotted versus log n, the graphical appearance 
is virtually identical to the top right panel of Figure [5] If comparisons are based on moment- 
based quantities, the overall indications are then that (i) MPLE and SF are preferable to the 
others, at least for small and moderate n, (ii) MPLE and SF are effectively equivalent. It is 
less clear how to combine the two bottom panels into a single summary plot, but we note 
that also here the vertical scale of right-side panel has a larger order of magnitude than the 
left-side panel; hence its behaviour would dominate in any reasonable combination of the 
two. 

In a second set of simulations, the data have still been sampled from a skew-normal dis- 
tribution but now all three parameters are regarded as unknown, so that the reference set 
of distributions is of type SN(£,o; 2 ,a). To ease comparisons, the parameter values and the 
sample sizes have been taken to be the same of Table 2 of |Sartori (2006), that is £ = 0, u = 1, 



a = 5, 10, with sample sizes n = 50, 100, 200, but here a substantially larger number of replic- 
ates has been generated for each sample size, namely 10 5 . In all cases the location and scale 
parameters were £ = and u = 1. Table [T] reports the summary values for three estimators, 
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Figure 5: Simulation study of the distributions of four estimators for samples of size 
n = 50, 100, 250, 350, 500, 1000 from SN(0, 1, a) when a only is estimated and the true value 
is a = 5; for each sample size 10 6 replicates have been generated. The four estimators are 
1:MLE, 2:MPLE, 3:SF, 4:W P = W, but on three of the four panels MPLE and SF are graph- 
ically coincident. Top left panel: log |bias|, top right: log standard deviation, bottom left: 
log |median bias|, bottom right: log IQR; in all case the horizontal axis represents log n. 
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Table 1: Summary quantities of the distribution of MLE, MPLE and (19) estimating the 
parameters (£,o/, a) of a distribution SN(0, l,a) when a = 5, 10, based on a sample of size 
n = 50, 100, 200. Aii entries are based on 10 5 replicated samples. 



S 



€ 



a 



5 50 


mean bias 


0.0235 


-0.0209 


1.472 


0.1740 


-0.1252 


-1.411 


1.808 




median bias 


0.0028 


-0.0160 


0.124 


0.0884 


-0.1023 


-1.726 


-0.610 




std. dev. 


0.1502 


0.1411 


4.974 


0.2661 


0.1677 


2.600 


7.641 




P{a = oo} 






0.139 










100 


mean bias 


0.0060 


-0.0075 


1.442 


0.0534 


-0.0499 


-0.456 


0.866 




median bias 


0.0002 


-0.0066 


0.263 


0.0379 


-0.0456 


-0.872 


-0.306 




std. dev. 


0.0839 


0.0959 


4.896 


0.1118 


0.1011 


2.320 


5.615 




P{d = oo} 






0.023 










200 


mean bias 


0.0029 


-0.0036 


0.592 


0.0216 


-0.0222 


-0.195 


0.197 




median bias 


0.0005 


-0.0035 


0.150 


0.0181 


-0.0216 


-0.410 


-0.152 




std. dev. 


0.0544 


0.0656 


2.508 


0.0558 


0.0655 


1.540 


2.254 




P{d = oo} 






0.001 










10 50 


mean bias 


0.0225 


-0.0235 


0.788 


0.1249 


-0.1049 


-3.992 


3.497 




median bias 


0.0076 


-0.0210 


-1.001 


0.0685 


-0.0898 


-4.586 


-1.070 




std. dev. 


0.0910 


0.1213 


6.874 


0.2038 


0.1461 


3.769 


11.775 




P{d = oo} 






0.345 










100 


mean bias 


0.0047 


-0.0069 


3.274 


0.0393 


-0.0418 


-1.462 


3.591 




median bias 


-0.0004 


-0.0067 


0.527 


0.0299 


-0.0404 


-2.481 


-0.638 




std. dev. 


0.0560 


0.0830 


9.399 


0.0677 


0.0845 


4.580 


12.840 




P{a = oo} 






0.129 










200 


mean bias 


0.0015 


-0.0025 


2.575 


0.0170 


-0.0190 


-0.422 


1.583 




median bias 


-0.0026 


-0.0026 


0.563 


0.0140 


-0.0188 


-1.275 


-0.360 




std. dev. 


0.0374 


0.0581 


8.072 


0.0375 


0.0574 


4.160 


8.738 




P{a = oo} 






0.021 











that is MLE, MPLE and (19) ; for the last one, only the estimate of a is given, since the first 
two components of the estimate coincide with those of MPLE. 

Inspection of the Table [T] confirms the improvement of 5 over a. and its overall similarity 
of the bias of a with the analogous entry in Table 2 of|Sartori] (|2006]) . Notice that, a. and a are 



now no longer essentially coincident as they were in the one-parameter case. In interpreting 
the bias and standard deviation of a one must bear in mind that this have been computed 
excluding the case with diverging estimates; in practical term, we have taken |d| > 100 
as an indication of a diverging estimate. The exclusion of the diverging estimates has a 
relevant effect especially in the present setting where the probability of this event appears to 
be appreciably higher than in the one-parameter case; for instance with n = 50 and a = 5 
this probability is now about 0.139, while it is only 0.039 in the one-parameter case examined 
earlier. Bias and variability of £ and Co are somewhat higher than those of the MLE's, £ and Co, 
at least for n = 50 and to some extent for n = 100. It is then conceivable to adopt £ and ui as 
estimates of location and scale, and a for estimating shape, similarly to the strategy of Sartori. 
This choice is advantageous as for formal properties, but it has the logical drawback of the 
lack of a unique estimation criterion. Also in this setting, the estimate a has low median bias, 
but this is paid by a substantial increase in variability. 
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3 Extension to the skew-£ distribution 

A distribution closely related to the skew-normal is the skew-t whose density in the scalar 
case is 

lt(^u)T(a^iJ^;u + l), * € R, (22) 

u> \ co J \ w V v + Qx J 

where £(•; v) is the Student's t density with v > degrees of freedom, T(-; v + 1) is the t 
distribution function for u + 1 degrees of freedom and Q x = u~ 2 (x — £) 2 ; here v is a positive 
value which can be non-integer. This distribution allows regulation of the tail thickness via 
the additional parameter v. If a continuous random variable Y has density function (22) we 
write y ^ST(^uj 2 ,a,u). 

Initial work for applying the method of Firth to the case of a skew-t distribution has been 
done by |Sarto ri (2006), who has considered estimation of a when the other parameters are 
known. For the three-parameter case (£, uj, a) with known v, Sartori has proposed a two-step 
procedure, similarly to the skew-norma l case. This direction has been explored further by 



Lagos Alvarez & Jimenez Gamero (2012) who have shown that the corresponding estimating 



equation is of the same form M(a) = as in (4) with a,2 and a^ replaced by suitably modified 



expressions which depend on v besides a. Moreover, they prove that M(a) = always admits 
a finite solution when v > 2. 

In the ST case, we proceed similarly to the SN case to obtain a close approximation of 
the M(a) function. The plot on the left panel of Figure [6] plays a similar role of the left 
plot in Figure |3j except that we now have a sequence of points for each chosen values of v, 
specifically v = \, 1, 2, 5, 10, 50, with different plotting symbols for each value of v. Also in 
this case there is a striking alignment of the points referring to the same v, particularly so if 
we consider the wide range of v values considered. 

For any fixed value of v, we approximate a/ [—2 M(a)] by a function of the form 

a 2 

~ ei v + e 2y a 



-2M(o) 

whose coefficients e\ v and &2v are obtained by matching the behaviour of the two sides at 
a 2 = and a 2 — > oo. After some algebraic work summarized in an appendix, we arrive at the 
expressions 



en 



3 



z-iv = gl 



E{X 2 Ci(X i; z, + l)} (23) 



[xl Ci (x 3X /(^TiW+3); v + 1) } 



E 



where 



9» = ,.:.\,2 > Ci(x; v) = ^E^ ■ (24) 



(i/ + 2)(^ + 3) . , _ t(x;u) 

(y + \y T(x;u) 

and Xk ~ tv+k- The two expected values involved by e2u must be evaluated numerically. 
The dashed lines in the left panel of Figure [6] have intercepts and slopes given by (23) ; it is 
apparent that the lines interpolate the points described earlier almost exactly. 



Again, the integral of —M(a) is then closely approximated by |(13) with coefficients c\ 



1/(4 &2v) and c 2 = e2 V /e\ v . The right panel of Figure [6] displays the Q function obtained 
by numerical integration of — M(a) as continuous lines, and their approximation of the form 
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Figure 6: Left panel: values of a / [—2 M (a)] for ST distributions, for v = |, 1,2,5, 10, 50, 
numerically evaluated at a grid of points, plotted versus a 2 and superimposed approximating 
line for each selected v. Right panel: Q function obtained by numerical integration of—M(a) 
(continuous lines), and by Q described in the text (dot-dashed lines). 



(13) as dot-dashed lines. The two set of lines are in fact virtually indisti ngui shable. Therefore 



this choice of Q is essentially equivalent to the one of 



Sartori 



(2006) and Lagos Alvarez & 



Jimenez Gamero (2012 ) in the case ST(0, 1, a, v) when v is regarded as known and only a is 
estimated, but it has the computational advantage of avoiding the numerical evaluation of a 
very large number of integrals, which are required when the numerical algorithm for solving 
M(a) = searches over a range of a values. 

At variance from the above-quoted authors, we adopt the penalty function just described 
also in the four parameter case ST(£,o; 2 ,a,i/), analogously to what we did for the three- 
parameter SN case. A point of practical concern in this process is that, when u is not fixed, 
the algorithm for numerical optimization of £ p (9) visits many candidate values of v, and each 
of them involves numerical evaluation of the two integrals involved by e2 V in |(23)| To avoid 



these extensive integrations, we have explored a simple empirical approximation of e2 U - 

Some numerical exploration has show that log(e2iV e 2 _ 1) is very nearly a linear function of 
log(z/ + 7) where 7 = 0.57721 ... is the Euler's constant, and e2 is the limiting value|(14)| This 



near linearity is visible in the left plot of Figure [7]which displays a set of numerically evaluated 
values e2v, for a range of degrees of freedom from v = 0.25 to v = 250, transformed to 
log(e2y/e2 ~~ 1) an d plotted versus log(z/ + 7). The interpolating line fitted by least squares has 
intercept 1.37 and slope —1.00 when rounded to two decimal places; these are the coefficients 
of the plotted line. In the right-side plot of the figure, the interpolation has been transformed 
back on the original scale, so that the continuous line superimposed to the points (v, e2 U ) is 
the approximation 

( 4 

&2v ~ e 2 H ■ — 

V v + l 
which appears to work well. Clearly the other coefficient, e\ v = g v /3, poses no problem. 
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log(v + y) 



Figure 7: Exact values of e2 V and their approximating function on the transformed scale (left 
side plot) and on the original scale (right side plot). 

In the multivariate case we proceed similarly to the skew-normal case, and consider a 
penalized log-likelihood function similar to (18)| but the coefficients c\ and C2 now depend 
on v, and the summands of the first term are replaced by the logarithm of the multivariate 
skew-t density function. This density is given by 



2t d {x-i- ) n i v)T 



1 d + v 

Qx + V 



a ijj (x — £); v + d 



x e 



(25) 



where td(x; £1) denotes the d-dimensional Student's t density function with location 0, scale 
matrix £} and v degrees of freedom; the earlier definition of Q x is now replaced by Q x = 

(x-o T n-\x-£). 
4 Discussion 

For the SN and ST distributions in the univariate and multivariate cases, we have examined 
a methodology which avoids the problem of estimates on the frontier of the parameter space 
which can occur with maximum likelihood estimation. The problem of estimates on the 
frontier is vanishing when the sample size diverges, but in practice, for small to moderate 
sample sizes, it arises with non-negligible probability and it has disturbing effects on the 
inferential process. 

The present proposal is, in a way, closely connected with existing results, but there are 
differences. One is that the focus here is shifted from unbiasedness to the use of a penalty 
function Q which can be chosen quite freely once a few requirements are satisfied. This is the 
basis of another difference from existing work: we have adopted the Q function arising from 
the one-parameter cases, SN(0, 1, a) and ST(0, 1, a, v) with fixed v, as the starting point for 
the choice of Q in more complex situations, that is multiparameter and multivariate settings. 
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There are various directions in which the present work can be extended, of which we 
mention a few. 

o As it stands, the methodology presented here is applicable to the skew-normal and the 
skew-t distributions, which are the two most commonly employed families from the 
broader set of skew-elliptical distributions. However the formulation could be adapted 
to other skew-elliptical families. 



o The asymptotic results of Section 2.1 are all of first-order type. There is wide room for 



higher asymptotic theory; accuracy of approximation (11) is of special interest 



o Alternative choices of the penalty Q could be considered provided the new function 
satisfies conditions |(8) and those indicated shortly thereafter. 



While these developments are interesting, the proposal at its present stage provides an 
already workable and quite general way to overcome what appears to us as the last obstacle 
to the systematic use of skew-normal and skew-t distributions in routine statistical work. 

Before closing, it is perhaps useful to point up that the adoption of the penalized likelihood 
formulation is compatible with the adoption of the centred parameterization mentioned in 
the introductory section as a tool to overcome the singularity of the information in the skew- 
normal case. The two mechanisms are conceptually distinct and they can coexist. Once 
the MPLE of the direct parameters have been obtained, they can be transformed into the 
centred parameter space, and the variance matrix of the MPLE estimates can be converted 
via the known Jacobian matrix of the transformation (Arellano-Valle & Azzalini 2008). For 



the skew-t distribution the problem of singular information matrix at a = does not arise for 



any < v < oo, as proved in Proposition 1 of Arellano-Valle (2010) 
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Appendix: Limiting behaviour of M(a) in the ST case 

Consider a random variable Z ~ ST(0, 1, a, v) and its transformation W = v(Z) Z where 



v(z) 



V + z 2 
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From |Sartori| ( |2006| ), we write 

I' (a) = Ci(aW;u + l)W, 
v + 1 



I" (a) 



l'(af + I' (a) I" {a) 



v + 2 
v + 1 



1 



a 2 W 2 ^- 1 



v + 1 



2 ixr2 



(i(aW; v + 1) W A - d(aW; v + lfW 



,,.1 + ^r aCi(a^;^ + l) 2 W 4 . 
i/ + 2 V v + 1 J 



where (i(x; v) is the function defined by (24) Let now Xj, ~ t(0, 1; v + k), and define the 
random variables 



Vk& 



v + 1 



v + k + {l-5 2 )Xl 



After some simple algebraic manipulations, where we use the relation 



t(z; v)t I az\\ — : — ^; v + 1 



V + z 2 ' 



t(0-v] 



v+1 / v+1 



V + Z' 



-1/2 



V + 1 



Vl+~a 2 z;v + l) , 



we obtain 



E{l"(a)} = -E{Ci(aW;v + l) 2 W 2 } 



.(l + a 2)-3/2 6j JV+[ TT[V , 



V 



E{l'{af}+E{l'{a)l"{a)} 



-"'Si^^n 11 -! 



E{X 2 V 1 sC 1 (5X 1 V ls ;v + l)} 



o^W 2 ^- 1 



M i+ a 2 r^K^f 3 



v + 1 



v+1 



v + 3 \v + 2 J \v + 3 
E{XIV 3 sCi(5X 3 V 3 s;v + l)} 

where b v = 2£(0; v). Thus, using the Sartori-Firth formulae for M(a), we write 

E{l'{a) 3 }+E{l'{a)l"{a)} 



M{a) 



-2E{l"{a)} 



a 
2 



v+1 
v+2 



E Ci(aiy ;i / + 1) 2 1 + 



ofW 2 ^- 1 



v + 1 



W 4 



E{Ci(aW;v + l) 2 W 2 } 

2 



(l + a 2 )-^ 2 b Ujfe £g (m E{XlV 3S Ci(SX 3 V 35 ;v + l)} 



2(l + a 2 )-y 2 b 1 ^ I E{X 2 V 15 Ci(SX 1 V ls -v + l)} 



Note that the second expression agrees with a matching one of Lagos Alvarez & Jimenez 



Gamero| ( |2012| ) . From the last expression, we obtain 

= (1 + a 2 ) 



n 



, u + 2\ 2 (v + 3\ i,2 E{X 2 V 15 Ci{5X l V 35] v + l)} 



2M(a) 



+ 1) \v + l) E{xlV 3 sC 1 (5X 3 V 35 ;v + l)} 



e\v + e^voc 
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Thus, by noting that for a 2 



\ (2r+l)/2 



«wm - ttJM^h) E ^>> 



K+kY " + 1 fv + k\r[(v + k + l-2r)/2](2r)\ 



we arrive at 



eiv 



v + k\ 2 ) T[{y + k + l)/2] 2 r r\ ' 



z^ + 2\ 2 /i/ + 3\ 3/2 E{X?Fi } 



^ + 1/ v^ + iy e{x|v 30 } 



i a+A 2 ^ + 2 vi 



3 V b^+3 7 V ^ + 1 



while 



e2y = lim 



1 + a 2 /^ + 2\ 2 /^+3\ 3/2 El^VigCi^Xx^^ + l)} _ ei, 
a 2 U + V U + V E{X 3 4 V3 5 Ci (^3^35^ + 1)} a 2 



z/ + 2\ 2 /z/ + 3\ 2 E{X 2 Ci(^i;^ + l)} 



z/ + iy \y + l) E{x|Ci(v^+l^ 3 /V^+3;i/ + l)}" 
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